Using the LARA platform to crowdsource a multilingual, multimodal Little Prince
Abstract
We describe an ongoing project, in which an informally organised international consortium is using the open source LARA platform to create multimodal annotated editions of Antoine de Saint-Exupéry’s Le petit prince in multiple languages, so far French, English, Italian, Icelandic, Irish, Japanese, Polish, Farsi and Mandarin. The LARA versions of the book include integrated audio and translations and an automatically generated lemma-based concordance, and are freely available online. We describe the methods used to construct the various versions. In some cases, the work for a given language was simply divided by type, typically with one person adding translations and another recording audio. In other cases, we experimented with crowdsourcing methods, splitting the text into chapter-sized units to distribute these to annotators, then combining the results at the end. Finally, we report an initial classroom study, where the French version was used with intermediate-level Australian students of French.
Similar resources
To Crowdsource or Not To Crowdsource?
Crowdsourcing contests—events to solicit solutions to problems via an open-call format for prizes—have gained ground as a mechanism for organizations to accomplish tasks. This paper uses game-theoretic models to develop design principles for crowdsourcing contests and answer the questions: what types of tasks should be crowdsourced? Under what circumstances? When a single task is to be complete...
Annotating the Little Prince with Chinese AMRs
Abstract Meaning Representation (AMR) is an annotation framework in which the meaning of a full sentence is represented as a rooted, acyclic, directed graph. In this paper, we describe a pilot proj...
Crowdsource a little to label a lot: labeling a speech corpus of dialectal Arabic
Arabic is a language with great dialectal variety, with Modern Standard Arabic (MSA) being the only standardized dialect. Spoken Arabic is characterized by frequent code-switching between MSA and Dialectal Arabic (DA). DA varieties are typically differentiated by region, but despite their widespread usage, they are under-resourced and lack viable corpora and tools necessary for speech recognit...
Extending an interoperable platform to facilitate the creation of multilingual and multimodal NLP applications
U-Compare is a UIMA-based workflow construction platform for building natural language processing (NLP) applications from heterogeneous language resources (LRs), without the need for programming skills. U-Compare has been adopted within the context of the METANET Network of Excellence, and over 40 LRs that process 15 European languages have been added to the U-Compare component library. In line...
Multilingual Multimodal Language Processing Using Neural Networks
We live in an increasingly multilingual multimodal world where it is common to find multiple views of the same entity across modalities and languages. For example, news articles which get published in multiple languages are essentially different views of the same entity. Similarly, video, audio and multilingual subtitles are multiple views of the same movie clip. Given the proliferation of such...
Journal
Journal title: Beyond Philology
Year: 2022
ISSN: 2451-1498, 1732-1220
DOI: https://doi.org/10.26881/bp.2022.1.09